Normal Inverse Gaussian Distribution (norminvgauss)#

The Normal Inverse Gaussian (NIG) distribution is a flexible continuous distribution on $\mathbb{R}$ that can capture heavy tails and skewness.

A key intuition is that NIG is a normal mean–variance mixture:

  • first draw a positive random variance $V$ from an inverse Gaussian distribution

  • then draw $X \mid V$ from a normal distribution with that random variance

This mixture view explains why NIG is popular in applications (e.g. financial returns, turbulence, stochastic volatility) and also gives a clean NumPy-only sampling algorithm.

Important naming pitfall: this is not the Normal-Inverse-Gamma distribution used in Bayesian conjugacy.

Notebook roadmap#

  1. Title & classification

  2. Intuition & motivation

  3. Formal definition (PDF/CDF)

  4. Moments & properties

  5. Parameter interpretation

  6. Derivations ($\mathbb{E}[X]$, $\mathrm{Var}(X)$, likelihood)

  7. Sampling & simulation (NumPy-only)

  8. Visualization (PDF, CDF, Monte Carlo)

  9. SciPy integration (scipy.stats.norminvgauss)

  10. Statistical use cases

  11. Pitfalls

  12. Summary

import math

import numpy as np

import scipy
from scipy import special, stats

import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

SEED = 7
rng = np.random.default_rng(SEED)

np.set_printoptions(precision=6, suppress=True)

print("python", __import__("sys").version.split()[0])
print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
python 3.12.9
numpy  1.26.2
scipy  1.15.0
plotly 6.5.2

Prerequisites & notation#

Prerequisites

  • comfort with PDFs/CDFs and expectation/variance

  • basic calculus (differentiation under the integral sign is helpful, but not required)

  • familiarity with moment generating functions (MGFs) / characteristic functions

Notation (common in the literature)

We will mainly use the $(\alpha,\beta,\delta,\mu)$ parameterization:

  • $\alpha > 0$: tail / steepness parameter

  • $\beta \in \mathbb{R}$: asymmetry (skew) parameter, with constraint $|\beta| < \alpha$

  • $\delta > 0$: scale parameter

  • $\mu \in \mathbb{R}$: location parameter

Define

$$\gamma = \sqrt{\alpha^2 - \beta^2} > 0.$$

SciPy parameterization

SciPy exposes NIG as scipy.stats.norminvgauss(a, b, loc=0, scale=1) with constraints $a>0$ and $|b| \le a$. The mapping to $(\alpha,\beta,\delta,\mu)$ is:

$$a = \alpha\delta, \qquad b = \beta\delta, \qquad \text{loc} = \mu, \qquad \text{scale} = \delta.$$

So if you think in $(\alpha,\beta,\delta,\mu)$, you pass a=alpha*delta, b=beta*delta to SciPy.
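
For example, a minimal construction of a frozen SciPy distribution under this mapping (the parameter values here are purely illustrative):

# Mapping sketch: think in (alpha, beta, delta, mu), pass (a, b, loc, scale) to SciPy
alpha, beta, delta, mu = 2.0, 1.0, 0.5, 0.0
rv = stats.norminvgauss(alpha * delta, beta * delta, loc=mu, scale=delta)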

1) Title & Classification#

  • Name: norminvgauss (Normal Inverse Gaussian / NIG)

  • Type: continuous

  • Support: $x \in \mathbb{R}$

  • Parameter space ($(\alpha,\beta,\delta,\mu)$ form): $$\alpha>0, \quad \delta>0, \quad \mu\in\mathbb{R}, \quad \beta\in\mathbb{R}\ \text{with}\ |\beta|<\alpha.$$

(SciPy form: $a>0$, $|b|\le a$, scale>0, loc real; the boundary $|b|=a$ corresponds to $\gamma=0$ and is numerically delicate.)

2) Intuition & Motivation#

What this distribution models#

NIG is often used when you want a model that is:

  • roughly bell-shaped, but

  • has heavier tails than a normal distribution (more extreme events), and/or

  • has skewness (asymmetric left vs right tails).

A powerful intuition is the mixture representation:

$$X \mid V \;\sim\; \mathcal{N}(\mu + \beta V,\; V), \qquad V \sim \text{Inverse-Gaussian}\!\left(\nu=\tfrac{\delta}{\gamma},\; \lambda=\delta^2\right),$$

where $V>0$ is random.

  • Randomizing the variance creates heavy tails.

  • The term $\beta V$ randomizes the mean in a correlated way, creating skewness.
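
Both effects can be seen in a quick simulation (a sketch using scipy.stats.invgauss for the mixing variable; all parameter values are illustrative):

# Mixture intuition (sketch): a random variance creates excess kurtosis,
# and the beta*V term creates skew. Values below are illustrative only.
v = stats.invgauss(mu=1.0).rvs(size=200_000, random_state=rng)  # IG mixing variable
z = rng.normal(size=v.size)

x_sym = np.sqrt(v) * z              # beta = 0: symmetric but heavy-tailed
x_skew = 0.8 * v + np.sqrt(v) * z   # beta = 0.8: adds right skew

print("excess kurtosis (mixture):", stats.kurtosis(x_sym))
print("excess kurtosis (normal) :", stats.kurtosis(rng.normal(size=v.size)))
print("skewness with beta = 0.8 :", stats.skew(x_skew))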

Typical real-world use cases#

  • Finance: log-returns (skew + heavy tails), NIG Lévy process increments

  • Stochastic volatility / time change models: Brownian motion evaluated at inverse-Gaussian random time

  • Signal processing: impulsive / heavy-tailed noise

  • Environmental / turbulence data: heavy-tailed fluctuations

Relations to other distributions#

  • Generalized hyperbolic family: NIG is a special case (with $\lambda=-\tfrac12$).

  • Normal + inverse Gaussian: NIG is a normal mean–variance mixture with IG mixing.

  • Normal limit (heuristic): for large $\alpha$ (with other parameters scaled appropriately), tails become lighter and the distribution becomes closer to normal.

A practical mental model:

NIG behaves like a normal distribution whose variance (and drift) jitters randomly from sample to sample.
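
To make the normal-limit heuristic concrete, here is a small numerical sketch: with $\beta=0$ and $\delta=\alpha$, the variance stays at 1 while the excess kurtosis $3/(\delta\gamma)=3/\alpha^2$ vanishes as $\alpha$ grows.

# Normal-limit sketch: beta=0, delta=alpha gives Var=1 and excess kurtosis 3/alpha^2
for alpha_big in (2.0, 10.0, 50.0):
    delta_big = alpha_big
    rv_nig = stats.norminvgauss(alpha_big * delta_big, 0.0, loc=0.0, scale=delta_big)
    xs_chk = np.linspace(-4.0, 4.0, 9)
    gap = float(np.max(np.abs(rv_nig.pdf(xs_chk) - stats.norm.pdf(xs_chk))))
    print(alpha_big, gap)  # the gap to the standard normal pdf shrinks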

3) Formal Definition#

PDF ($(\alpha,\beta,\delta,\mu)$ parameterization)#

Let $X \sim \text{NIG}(\alpha,\beta,\delta,\mu)$ with $\alpha>0$, $\delta>0$, and $|\beta|<\alpha$. Define $\gamma = \sqrt{\alpha^2-\beta^2}$.

The PDF is

$$f(x\mid\alpha,\beta,\delta,\mu) = \frac{\alpha\,\delta}{\pi\,\sqrt{\delta^2 + (x-\mu)^2}}\; K_1\!\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right) \exp\!\left(\delta\gamma + \beta\,(x-\mu)\right),$$

where $K_1$ is the modified Bessel function of the second kind of order 1.

PDF (SciPy “standardized” form)#

SciPy defines a standardized density

$$f(y; a, b) = \frac{a\,K_1\!\left(a\sqrt{1+y^2}\right)}{\pi\,\sqrt{1+y^2}}\, \exp\!\left(\sqrt{a^2-b^2} + b y\right), \qquad y\in\mathbb{R},$$

and then applies a location-scale transform $y=(x-\text{loc})/\text{scale}$.

CDF#

There is no simple elementary closed form for the CDF. It is defined by the integral

$$F(x) = \mathbb{P}(X\le x) = \int_{-\infty}^{x} f(t)\,dt,$$

and in practice it is computed numerically (e.g. scipy.stats.norminvgauss.cdf).
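
A quick sketch confirms this by integrating the PDF with quadrature (the parameter values and evaluation point are arbitrary):

# CDF via quadrature (sketch): integrate the pdf and compare to SciPy's cdf
from scipy import integrate

rv_q = stats.norminvgauss(2.0, 0.5)  # standardized NIG (loc=0, scale=1)
x_star = 0.7
cdf_quad, _ = integrate.quad(rv_q.pdf, -np.inf, x_star)
print(cdf_quad, rv_q.cdf(x_star))  # should agree to quadrature tolerance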

def nig_validate(alpha: float, beta: float, delta: float) -> None:
    if not (alpha > 0):
        raise ValueError("alpha must be > 0")
    if not (delta > 0):
        raise ValueError("delta must be > 0")
    if not (abs(beta) < alpha):
        raise ValueError("need |beta| < alpha so gamma = sqrt(alpha^2 - beta^2) is real")


def nig_gamma(alpha: float, beta: float) -> float:
    nig_validate(alpha, beta, delta=1.0)
    return float(math.sqrt(alpha * alpha - beta * beta))


def nig_logpdf(x: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
    '''Log-PDF using a numerically stable Bessel-K computation.

    Uses scipy.special.kve(1, z) = exp(z) * K_1(z) to avoid underflow for large z.
    '''
    nig_validate(alpha, beta, delta)
    x = np.asarray(x, dtype=float)

    xm = x - mu
    s2 = delta * delta + xm * xm
    s = np.sqrt(s2)

    gamma = math.sqrt(alpha * alpha - beta * beta)
    z = alpha * s

    # log K1(z) via scaled Bessel: K1(z) = exp(-z) * kve(1,z)
    log_k1 = np.log(special.kve(1.0, z)) - z

    return (
        math.log(alpha * delta)
        - math.log(math.pi)
        - np.log(s)
        + delta * gamma
        + beta * xm
        + log_k1
    )


def nig_pdf(x: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
    return np.exp(nig_logpdf(x, alpha, beta, delta, mu))


def to_scipy_params(alpha: float, beta: float, delta: float, mu: float) -> tuple[float, float, float, float]:
    '''Map (alpha,beta,delta,mu) to SciPy's (a,b,loc,scale).'''
    nig_validate(alpha, beta, delta)
    a = alpha * delta
    b = beta * delta
    return float(a), float(b), float(mu), float(delta)


def from_scipy_params(a: float, b: float, loc: float, scale: float) -> tuple[float, float, float, float]:
    '''Map SciPy's (a,b,loc,scale) to (alpha,beta,delta,mu).'''
    if not (scale > 0):
        raise ValueError("scale must be > 0")
    alpha = a / scale
    beta = b / scale
    delta = scale
    mu = loc
    nig_validate(alpha, beta, delta)
    return float(alpha), float(beta), float(delta), float(mu)


# Quick sanity check: our PDF matches SciPy's parameter mapping
from scipy.stats import norminvgauss

alpha, beta, delta, mu = 2.5, 0.8, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)

xg = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 400)
max_abs_diff = float(np.max(np.abs(rv.pdf(xg) - nig_pdf(xg, alpha, beta, delta, mu))))
max_abs_diff
7.771561172376096e-16

4) Moments & Properties#

A convenient starting point is the cumulant generating function (CGF) $K(t)=\log M_X(t)$, where $M_X(t)=\mathbb{E}[e^{tX}]$ is the MGF.

MGF / CGF#

For $t$ in the interval where $|\beta+t|<\alpha$:

$$M_X(t) = \exp\left(\mu t + \delta\big(\gamma - \sqrt{\alpha^2-(\beta+t)^2}\big)\right), \qquad \gamma = \sqrt{\alpha^2-\beta^2}.$$

Equivalently,

$$K(t) = \mu t + \delta\big(\gamma - \sqrt{\alpha^2-(\beta+t)^2}\big).$$

Domain: $t \in (-\alpha-\beta,\; \alpha-\beta)$.

Characteristic function#

For real $u$,

$$\varphi_X(u) = \mathbb{E}[e^{iuX}] = \exp\left(i\mu u + \delta\big(\gamma - \sqrt{\alpha^2-(\beta+iu)^2}\big)\right).$$

Mean, variance, skewness, kurtosis#

Using derivatives of $K(t)$ at $t=0$:

  • Mean: $\mathbb{E}[X] = \mu + \delta\beta/\gamma$

  • Variance: $\mathrm{Var}(X) = \delta\alpha^2/\gamma^3$

  • Skewness: $\text{skew}(X) = \dfrac{3\beta}{\alpha\sqrt{\delta\gamma}}$

  • (Excess) kurtosis: $\text{excess kurt}(X) = \dfrac{3\,(1+4\beta^2/\alpha^2)}{\delta\gamma}$ (so kurtosis $= 3 +{}$ excess)

Entropy#

The differential entropy is

$$H(X) = -\mathbb{E}[\log f(X)] = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx.$$

There is no simple closed form in elementary functions; in practice you estimate it numerically (quadrature or Monte Carlo).

Other properties (high level)#

  • Infinitely divisible: NIG defines a Lévy process (useful for increments / time series).

  • Closure under convolution (with common $\alpha,\beta$): if $X_1\sim\text{NIG}(\alpha,\beta,\delta_1,\mu_1)$ and $X_2\sim\text{NIG}(\alpha,\beta,\delta_2,\mu_2)$ are independent, then $X_1+X_2\sim\text{NIG}(\alpha,\beta,\delta_1+\delta_2,\mu_1+\mu_2)$.
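
A quick Monte Carlo sketch of the convolution property (the parameter values are illustrative):

# Convolution closure (sketch): X1 + X2 with shared alpha, beta should be
# NIG(alpha, beta, delta1 + delta2, mu1 + mu2)
alpha_c, beta_c = 2.0, 0.5
d1, d2, m1, m2 = 0.7, 1.1, -0.4, 0.9

rv1 = stats.norminvgauss(*to_scipy_params(alpha_c, beta_c, d1, m1))
rv2 = stats.norminvgauss(*to_scipy_params(alpha_c, beta_c, d2, m2))
rv12 = stats.norminvgauss(*to_scipy_params(alpha_c, beta_c, d1 + d2, m1 + m2))

s = (rv1.rvs(size=20_000, random_state=rng)
     + rv2.rvs(size=20_000, random_state=rng))

print(stats.kstest(s, rv12.cdf).statistic)  # small statistic => consistent with closure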

def nig_mean(alpha: float, beta: float, delta: float, mu: float) -> float:
    nig_validate(alpha, beta, delta)
    gamma = math.sqrt(alpha * alpha - beta * beta)
    return float(mu + delta * beta / gamma)


def nig_var(alpha: float, beta: float, delta: float) -> float:
    nig_validate(alpha, beta, delta)
    gamma = math.sqrt(alpha * alpha - beta * beta)
    return float(delta * alpha * alpha / (gamma**3))


def nig_skew(alpha: float, beta: float, delta: float) -> float:
    nig_validate(alpha, beta, delta)
    gamma = math.sqrt(alpha * alpha - beta * beta)
    return float(3.0 * beta / (alpha * math.sqrt(delta * gamma)))


def nig_excess_kurt(alpha: float, beta: float, delta: float) -> float:
    nig_validate(alpha, beta, delta)
    gamma = math.sqrt(alpha * alpha - beta * beta)
    return float(3.0 * (1.0 + 4.0 * (beta * beta) / (alpha * alpha)) / (delta * gamma))


def nig_mgf(t: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
    '''MGF evaluated on an array t (real); returns nan outside the domain.'''
    nig_validate(alpha, beta, delta)
    t = np.asarray(t, dtype=float)
    gamma = math.sqrt(alpha * alpha - beta * beta)

    inside = alpha * alpha - (beta + t) ** 2
    out = np.full_like(t, np.nan, dtype=float)
    mask = inside > 0
    out[mask] = np.exp(mu * t[mask] + delta * (gamma - np.sqrt(inside[mask])))
    return out


def nig_cf(u: np.ndarray, alpha: float, beta: float, delta: float, mu: float) -> np.ndarray:
    '''Characteristic function for real u.'''
    nig_validate(alpha, beta, delta)
    u = np.asarray(u, dtype=float)
    gamma = math.sqrt(alpha * alpha - beta * beta)
    inner = np.sqrt(alpha * alpha - (beta + 1j * u) ** 2)
    return np.exp(1j * mu * u + delta * (gamma - inner))


alpha, beta, delta, mu = 2.5, 0.8, 1.2, -0.3
{
    "mean": nig_mean(alpha, beta, delta, mu),
    "var": nig_var(alpha, beta, delta),
    "skew": nig_skew(alpha, beta, delta),
    "excess_kurt": nig_excess_kurt(alpha, beta, delta),
}
{'mean': 0.10531231768391913,
 'var': 0.5644389450812155,
 'skew': 0.569429411031021,
 'excess_kurt': 1.4878339661647202}
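
A Monte Carlo cross-check of the characteristic function (a sketch; the sample size and u values are arbitrary):

# CF sanity check (sketch): Monte Carlo estimate of E[exp(iuX)] vs the closed form
u = np.array([0.3, 1.0, 2.5])
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
x_mc = stats.norminvgauss(a, b, loc=loc, scale=scale).rvs(size=100_000, random_state=rng)
cf_mc = np.exp(1j * np.outer(u, x_mc)).mean(axis=1)
print(np.max(np.abs(cf_mc - nig_cf(u, alpha, beta, delta, mu))))  # small (MC error)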
# Entropy (Monte Carlo estimate): H(X) = -E[log f(X)]
# (The NumPy-only sampler is defined later; here we use SciPy's sampler.)

from scipy.stats import norminvgauss

alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)

x = rv.rvs(size=50_000, random_state=rng)
entropy_mc = float(-np.mean(nig_logpdf(x, alpha, beta, delta, mu)))
entropy_mc
1.2272073243850334

5) Parameter Interpretation#

Think in terms of $(\alpha,\beta,\delta,\mu)$:

  • $\mu$ (location) shifts the distribution left/right.

  • $\delta$ (scale) stretches/compresses the distribution.

  • $\beta$ (asymmetry) controls skewness:

    • $\beta>0$: heavier right tail (more mass on the right)

    • $\beta<0$: heavier left tail

  • $\alpha$ (tail/steepness) controls tail heaviness:

    • larger $\alpha$ $\Rightarrow$ lighter tails (closer to normal)

    • smaller $\alpha$ $\Rightarrow$ heavier tails (more extremes)

The constraint $|\beta|<\alpha$ ensures $\gamma=\sqrt{\alpha^2-\beta^2}$ is real.

A useful derived quantity:

  • $\gamma$ appears in the mean/variance and in the MGF domain.

  • As $|\beta|\to\alpha$, $\gamma\to 0$: the distribution becomes extremely skewed and heavy-tailed, and computations near this boundary are numerically delicate.

SciPy caution: SciPy’s a and b are scaled by $\delta$ (since $a=\alpha\delta$, $b=\beta\delta$). So changing scale while holding a, b fixed changes $(\alpha,\beta)$ as well.
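
A small sketch of this caution, using the from_scipy_params helper defined earlier:

# Caution (sketch): holding SciPy's (a, b) fixed while changing scale changes
# the implied (alpha, beta)
a_fix, b_fix = 2.0, 0.5
for sc in (1.0, 2.0):
    alpha_i, beta_i, delta_i, _ = from_scipy_params(a_fix, b_fix, loc=0.0, scale=sc)
    print(f"scale={sc}: alpha={alpha_i}, beta={beta_i}, delta={delta_i}")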

6) Derivations#

We sketch three core derivations.

A) Expectation#

Start from the CGF:

$$K(t) = \mu t + \delta\Big(\gamma - \sqrt{\alpha^2-(\beta+t)^2}\Big).$$

Differentiate:

$$K'(t) = \mu + \delta\,\frac{\beta+t}{\sqrt{\alpha^2-(\beta+t)^2}}.$$

Evaluating at $t=0$ gives the mean:

$$\mathbb{E}[X] = K'(0) = \mu + \delta\,\frac{\beta}{\gamma}.$$

B) Variance#

Differentiate again:

$$K''(t) = \delta\,\frac{\alpha^2}{\big(\alpha^2-(\beta+t)^2\big)^{3/2}},$$

so

$$\mathrm{Var}(X) = K''(0) = \delta\,\frac{\alpha^2}{\gamma^3}.$$
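
These formulas can be sanity-checked with finite differences of the CGF (a sketch; the step size and parameter values are arbitrary):

# Finite-difference check (sketch): K'(0), K''(0) vs the closed-form mean/variance
alpha_d, beta_d, delta_d, mu_d = 2.5, 0.8, 1.2, -0.3
h = 1e-5

def cgf(t: float) -> float:
    return float(np.log(nig_mgf(np.array([t]), alpha_d, beta_d, delta_d, mu_d))[0])

{
    "K'(0) fd": (cgf(h) - cgf(-h)) / (2 * h),
    "mean formula": nig_mean(alpha_d, beta_d, delta_d, mu_d),
    "K''(0) fd": (cgf(h) - 2 * cgf(0.0) + cgf(-h)) / (h * h),
    "var formula": nig_var(alpha_d, beta_d, delta_d),
}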

C) Likelihood#

For i.i.d. data $x_1,\dots,x_n$, the likelihood is

$$L(\alpha,\beta,\delta,\mu) = \prod_{i=1}^n f(x_i\mid\alpha,\beta,\delta,\mu),$$

and the log-likelihood is

$$\ell(\alpha,\beta,\delta,\mu) = \sum_{i=1}^n \log f(x_i\mid\alpha,\beta,\delta,\mu).$$

Because the density involves $K_1(\cdot)$, the MLE does not have a simple closed form; numerical optimization is typical. In SciPy you can use norminvgauss.fit (MLE under the loc/scale convention).

def nig_loglik(alpha: float, beta: float, delta: float, mu: float, x: np.ndarray) -> float:
    return float(np.sum(nig_logpdf(x, alpha, beta, delta, mu)))


alpha, beta, delta, mu = 2.5, 0.8, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = stats.norminvgauss(a, b, loc=loc, scale=scale)

x = rv.rvs(size=2_000, random_state=rng)
ll_true = nig_loglik(alpha, beta, delta, mu, x)

# Compare to a slightly misspecified parameter (lower beta)
ll_alt = nig_loglik(alpha, beta * 0.7, delta, mu, x)

{"loglik_true": ll_true, "loglik_alt": ll_alt, "diff": ll_true - ll_alt}
{'loglik_true': -2185.389754329737,
 'loglik_alt': -2212.5723866013323,
 'diff': 27.18263227159514}

7) Sampling & Simulation (NumPy-only)#

The mixture representation gives a simple sampler.

Step 1: sample the inverse Gaussian mixing variable#

We use the $(\nu,\lambda)$ parameterization with density

$$f(v;\nu,\lambda) = \sqrt{\frac{\lambda}{2\pi v^3}}\, \exp\left(-\frac{\lambda(v-\nu)^2}{2\nu^2 v}\right), \qquad v>0.$$

For NIG, we need

$$V \sim \text{IG}\left(\nu=\frac{\delta}{\gamma},\; \lambda=\delta^2\right).$$

A classic exact sampler is the Michael–Schucany–Haas method.

Step 2: sample the conditional normal#

Given $V=v$:

$$X\mid V=v \sim \mathcal{N}(\mu+\beta v,\; v).$$

So we can generate

$$X = \mu + \beta V + \sqrt{V}\,Z, \qquad Z\sim\mathcal{N}(0,1).$$

Everything below uses NumPy only.

def sample_invgauss_msh(size: int, nu: float, lam: float, rng: np.random.Generator) -> np.ndarray:
    '''Sample IG(nu, lam) using the Michael–Schucany–Haas method.

    Parameterization: mean = nu, shape = lam.
    '''
    if not (nu > 0):
        raise ValueError("nu must be > 0")
    if not (lam > 0):
        raise ValueError("lam must be > 0")

    v = rng.normal(size=size)
    y = v * v

    nu2 = nu * nu

    x = (
        nu
        + (nu2 * y) / (2.0 * lam)
        - (nu / (2.0 * lam)) * np.sqrt(4.0 * nu * lam * y + nu2 * y * y)
    )

    u = rng.uniform(size=size)
    return np.where(u <= nu / (nu + x), x, nu2 / x)


def sample_nig(size: int, alpha: float, beta: float, delta: float, mu: float, rng: np.random.Generator) -> np.ndarray:
    nig_validate(alpha, beta, delta)
    gamma = math.sqrt(alpha * alpha - beta * beta)
    nu = delta / gamma
    lam = delta * delta

    v = sample_invgauss_msh(size=size, nu=nu, lam=lam, rng=rng)
    z = rng.normal(size=size)
    return mu + beta * v + np.sqrt(v) * z


# Quick simulation check: sample moments vs formulas
alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
x = sample_nig(size=200_000, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)

{
    "sample_mean": float(x.mean()),
    "theory_mean": nig_mean(alpha, beta, delta, mu),
    "sample_var": float(x.var(ddof=0)),
    "theory_var": nig_var(alpha, beta, delta),
}
{'sample_mean': 0.13341510615123034,
 'theory_mean': 0.13565855667130944,
 'sample_var': 0.7147789033673897,
 'theory_var': 0.7160715875654602}

8) Visualization (PDF, CDF, Monte Carlo)#

We’ll visualize:

  • how changing $\alpha$ and $\beta$ affects tail heaviness and skewness

  • the PDF and CDF for a chosen parameter set

  • a Monte Carlo histogram + empirical CDF compared to the theoretical curves

def plot_nig_pdfs(param_sets: list[dict], q_low: float = 0.001, q_high: float = 0.999) -> go.Figure:
    from scipy.stats import norminvgauss

    fig = go.Figure()

    for ps in param_sets:
        alpha, beta, delta, mu = ps["alpha"], ps["beta"], ps["delta"], ps["mu"]
        a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
        rv = norminvgauss(a, b, loc=loc, scale=scale)

        xs = np.linspace(rv.ppf(q_low), rv.ppf(q_high), 600)
        ys = nig_pdf(xs, alpha, beta, delta, mu)

        label = f"α={alpha:g}, β={beta:g}, δ={delta:g}, μ={mu:g}"
        fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name=label))

    fig.update_layout(title="Normal Inverse Gaussian PDFs", xaxis_title="x", yaxis_title="f(x)")
    return fig


param_sets = [
    {"alpha": 3.0, "beta": 0.0, "delta": 1.0, "mu": 0.0},  # symmetric, lighter tails
    {"alpha": 1.6, "beta": 0.0, "delta": 1.0, "mu": 0.0},  # symmetric, heavier tails
    {"alpha": 2.0, "beta": 0.7, "delta": 1.0, "mu": 0.0},  # right-skew
    {"alpha": 2.0, "beta": -0.7, "delta": 1.0, "mu": 0.0},  # left-skew
]

fig = plot_nig_pdfs(param_sets)
fig.show()
# PDF + CDF for one parameter set (CDF via SciPy)
from scipy.stats import norminvgauss

alpha, beta, delta, mu = 2.0, 0.7, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = norminvgauss(a, b, loc=loc, scale=scale)

xs = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 700)

fig_pdf = go.Figure()
fig_pdf.add_trace(go.Scatter(x=xs, y=rv.pdf(xs), mode="lines", name="pdf"))
fig_pdf.update_layout(title="NIG PDF", xaxis_title="x", yaxis_title="f(x)")

fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=xs, y=rv.cdf(xs), mode="lines", name="cdf"))
fig_cdf.update_layout(title="NIG CDF", xaxis_title="x", yaxis_title="F(x)")

fig_pdf.show()
fig_cdf.show()
# Monte Carlo samples vs PDF
alpha, beta, delta, mu = 2.0, 0.7, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = stats.norminvgauss(a, b, loc=loc, scale=scale)

n = 80_000
x = sample_nig(size=n, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)

xs = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 600)

fig = px.histogram(
    x,
    nbins=90,
    histnorm="probability density",
    title=f"Monte Carlo samples vs PDF (n={n:,})",
    labels={"value": "x"},
)
fig.add_trace(go.Scatter(x=xs, y=rv.pdf(xs), mode="lines", name="theoretical pdf"))
fig.update_layout(yaxis_title="density")
fig.show()
# Empirical CDF vs theoretical CDF
alpha, beta, delta, mu = 2.0, 0.7, 1.2, -0.3
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)
rv = stats.norminvgauss(a, b, loc=loc, scale=scale)

n = 30_000
x = sample_nig(size=n, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)

xs = np.sort(x)
ys = np.arange(1, n + 1) / n

xg = np.linspace(rv.ppf(0.001), rv.ppf(0.999), 600)

fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name="empirical CDF"))
fig.add_trace(go.Scatter(x=xg, y=rv.cdf(xg), mode="lines", name="theoretical CDF"))
fig.update_layout(title="Empirical CDF vs theoretical CDF", xaxis_title="x", yaxis_title="F(x)")
fig.show()

9) SciPy Integration (scipy.stats.norminvgauss)#

SciPy provides:

  • pdf, logpdf

  • cdf, ppf (numerical)

  • rvs (sampling)

  • fit (MLE)

Remember the mapping:

$$a = \alpha\delta, \quad b=\beta\delta, \quad \text{loc}=\mu, \quad \text{scale}=\delta.$$

from scipy.stats import norminvgauss

alpha, beta, delta, mu = 2.2, 0.6, 1.4, -0.1
a, b, loc, scale = to_scipy_params(alpha, beta, delta, mu)

rv = norminvgauss(a, b, loc=loc, scale=scale)

# Basic API
x0 = np.array([-2.0, 0.0, 1.0])
out = {
    "pdf(x0)": rv.pdf(x0),
    "cdf(x0)": rv.cdf(x0),
    "mean": rv.mean(),
    "var": rv.var(),
}
out
{'pdf(x0)': array([0.008404, 0.519158, 0.28318 ]),
 'cdf(x0)': array([0.002845, 0.371315, 0.820179]),
 'mean': 0.29686269665968856,
 'var': 0.7145890817830703}
# Fit example: recover parameters from synthetic data
alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
a_true, b_true, loc_true, scale_true = to_scipy_params(alpha, beta, delta, mu)

rv_true = norminvgauss(a_true, b_true, loc=loc_true, scale=scale_true)
data = rv_true.rvs(size=3_000, random_state=rng)

a_hat, b_hat, loc_hat, scale_hat = norminvgauss.fit(data)

{
    "true": (a_true, b_true, loc_true, scale_true),
    "hat": (a_hat, b_hat, loc_hat, scale_hat),
}
{'true': (2.6, 0.65, -0.2, 1.3),
 'hat': (3.080842815399469,
  0.6938365831228384,
  -0.21570215913180396,
  1.4409291715704364)}

10) Statistical Use Cases#

A) Hypothesis testing / goodness-of-fit#

In practice you often ask: Do my residuals / increments look NIG?

A common approach is a goodness-of-fit test (e.g. Kolmogorov–Smirnov).

Important caution: if you estimate parameters from the data and then run a standard KS test, the p-value is no longer valid (because the null distribution changed). A simple fix is a parametric bootstrap:

  1. fit parameters (\hat\theta)

  2. compute the test statistic on the observed data

  3. simulate many datasets from the fitted model

  4. refit + recompute the statistic for each simulated dataset

  5. estimate the p-value by the bootstrap tail probability

B) Bayesian modeling#

NIG is useful as a likelihood (or error model) when residuals are heavy-tailed and skewed.

The mixture representation introduces latent variables $V_i$ such that

$$X_i \mid V_i \sim \mathcal{N}(\mu+\beta V_i,\; V_i), \quad V_i \sim \text{IG}(\delta/\gamma,\; \delta^2),$$

which can make Bayesian inference more tractable because the conditional model is Gaussian.
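
As a minimal sketch of that tractability, assume $\alpha,\beta,\delta$ are known and the latent $V_i$ have been drawn; with a hypothetical normal prior $\mathcal{N}(m_0, s_0^2)$ on $\mu$, the conditional posterior of $\mu$ is Gaussian:

# Conditional conjugacy (sketch): given V_i, x_i - beta*V_i | mu ~ N(mu, V_i),
# so a normal prior on mu has a Gaussian conditional posterior
alpha_b, beta_b, delta_b, mu_true = 2.0, 0.5, 1.3, -0.2
gamma_b = math.sqrt(alpha_b**2 - beta_b**2)

V = sample_invgauss_msh(size=1_000, nu=delta_b / gamma_b, lam=delta_b**2, rng=rng)
x_b = mu_true + beta_b * V + np.sqrt(V) * rng.normal(size=V.size)

m0, s0 = 0.0, 10.0                                  # hypothetical prior N(m0, s0^2)
post_prec = 1.0 / s0**2 + np.sum(1.0 / V)           # posterior precision
post_mean = (m0 / s0**2 + np.sum((x_b - beta_b * V) / V)) / post_prec
print(post_mean, 1.0 / math.sqrt(post_prec))        # posterior mean/sd; mean near mu_true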

C) Generative modeling#

Because NIG is infinitely divisible, it is natural for increments. A simple generative model is a random walk with NIG innovations:

$$S_t = S_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{NIG}(\alpha,\beta,\delta,\mu).$$

# A) Parametric bootstrap KS test demo (small B for speed)
# We'll generate data from a known NIG, fit it, and test goodness-of-fit.

from scipy.stats import kstest, norminvgauss

rng_local = np.random.default_rng(123)

alpha, beta, delta, mu = 2.0, 0.5, 1.3, -0.2
a_true, b_true, loc_true, scale_true = to_scipy_params(alpha, beta, delta, mu)
rv_true = norminvgauss(a_true, b_true, loc=loc_true, scale=scale_true)

n = 800
x = rv_true.rvs(size=n, random_state=rng_local)

# Fit the model
a_hat, b_hat, loc_hat, scale_hat = norminvgauss.fit(x)
rv_hat = norminvgauss(a_hat, b_hat, loc=loc_hat, scale=scale_hat)

# KS statistic on observed data against fitted CDF
ks_obs = kstest(x, rv_hat.cdf).statistic

# Parametric bootstrap: simulate from fitted, refit, recompute KS
B = 80
ks_boot = np.empty(B)
for i in range(B):
    xb = rv_hat.rvs(size=n, random_state=rng_local)
    a_b, b_b, loc_b, scale_b = norminvgauss.fit(xb)
    rv_b = norminvgauss(a_b, b_b, loc=loc_b, scale=scale_b)
    ks_boot[i] = kstest(xb, rv_b.cdf).statistic

p_boot = float(np.mean(ks_boot >= ks_obs))

{"ks_obs": float(ks_obs), "p_boot": p_boot, "B": B}
{'ks_obs': 0.01620866142668531, 'p_boot': 0.75, 'B': 80}
# C) Generative modeling: random walk with NIG vs normal innovations (matched mean/var)

T = 300
alpha, beta, delta, mu = 2.0, 0.5, 1.0, 0.0

eps_nig = sample_nig(size=T, alpha=alpha, beta=beta, delta=delta, mu=mu, rng=rng)

m = nig_mean(alpha, beta, delta, mu)
v = nig_var(alpha, beta, delta)
eps_norm = rng.normal(loc=m, scale=math.sqrt(v), size=T)

s0 = 0.0
s_nig = s0 + np.cumsum(eps_nig)
s_norm = s0 + np.cumsum(eps_norm)

fig = go.Figure()
fig.add_trace(go.Scatter(y=s_nig, mode="lines", name="NIG random walk"))
fig.add_trace(go.Scatter(y=s_norm, mode="lines", name="Normal random walk (matched mean/var)"))
fig.update_layout(title="Random walk paths", xaxis_title="t", yaxis_title="S_t")
fig.show()

# Compare tail behavior of innovations
qs = [0.001, 0.01, 0.5, 0.99, 0.999]
{
    "quantiles": qs,
    "nig": np.quantile(eps_nig, qs),
    "normal": np.quantile(eps_norm, qs),
}
{'quantiles': [0.001, 0.01, 0.5, 0.99, 0.999],
 'nig': array([-2.21934 , -1.787824,  0.211728,  2.341481,  2.964354]),
 'normal': array([-2.208528, -1.544096,  0.255063,  1.975488,  2.075406])}

11) Pitfalls#

  • Parameter constraints: you must have $\alpha>0$, $\delta>0$, and $|\beta|<\alpha$. Near the boundary $|\beta|\approx\alpha$, numerical issues are common.

  • Confusing names: norminvgauss (Normal Inverse Gaussian) is different from normal-inverse-gamma.

  • Numerical stability:

    • the PDF involves $K_1(z)$, which can underflow for large $z$

    • prefer logpdf for inference and use scaled Bessel functions (e.g. scipy.special.kve) when implementing formulas; see the sketch after this list

  • Fitting can be unstable:

    • small samples may not pin down tail parameters well

    • MLE may be sensitive to initialization / local optima

    • always validate fit quality with diagnostic plots (QQ plots, tail behavior)
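
A tiny demonstration of the Bessel underflow pitfall (the value z = 800 is arbitrary):

# Underflow sketch: K_1(z) underflows to 0.0 for large z, while the scaled
# kve(1, z) = exp(z) * K_1(z) keeps log K_1(z) computable
z = 800.0
print(special.kv(1, z))               # 0.0: direct evaluation underflows
print(np.log(special.kve(1, z)) - z)  # finite log K_1(z) via the scaled form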

12) Summary#

  • norminvgauss is a continuous distribution on $\mathbb{R}$ that models skewed, heavy-tailed data.

  • Its key intuition is a normal mean–variance mixture with an inverse Gaussian mixing variable.

  • The PDF involves a modified Bessel function $K_1$; the CDF is typically computed numerically.

  • The CGF/MGF gives compact formulas for mean/variance/skewness/kurtosis.

  • Sampling is straightforward via the mixture representation and can be done with NumPy only.

  • SciPy’s scipy.stats.norminvgauss provides pdf, cdf, rvs, and fit with a clear mapping to $(\alpha,\beta,\delta,\mu)$.